AVERAGE OPTIMALITY FOR RISK-SENSITIVE CONTROL WITH
GENERAL STATE SPACE

By Anna Jaskiewicz 

Wroclaw University of Technology 

This paper deals with discrete-time Markov control processes on
a general state space. A long-run risk-sensitive average cost criterion
is used as a performance measure. The one-step cost function is
nonnegative and possibly unbounded. Using the vanishing discount
factor approach, the optimality inequality and an optimal stationary
strategy for the decision maker are established.

1. Introduction and the model. This paper deals with discrete-time
Markov control processes on a general state space. The one-step cost function
is nonnegative and possibly unbounded. The decision maker is supposed to
be risk-averse with a constant risk coefficient $\gamma > 0$. The risk-sensitive
average cost criterion is used as a performance measure. The aim of the work is to
establish the optimality inequality for risk-sensitive dynamic programming
and derive an optimal stationary policy. The result is proved under two
different sets of compactness-continuity assumptions, namely, for Markov
control processes with weakly continuous transition probabilities [Condition
(W)], as well as transition probabilities that are continuous with respect
to setwise convergence [Condition (S)]. A similar problem for risk-neutral
stochastic control models has been examined in [27] using the vanishing
discount factor approach. However, it is well known that, for risk-sensitive
control models, an analogous approximation of the average cost via a sequence
of the corresponding discounted models does not work. Instead,
following [9, 15, 16], we introduce an auxiliary discounted minimax problem.
A variational formula that expresses the mutual relationship between the
relative entropy function and the logarithmic moment-generating function
enables us to connect the discounted minimax model with the original one.
Next, assuming that a certain family of functions is bounded [Condition (B)]
and using Fatou's lemma (for weakly or setwise convergent measures), we 
obtain the optimality inequality. 

The predecessor of our result is Theorem 4.1 in [16], where the optimality 
inequality for the risk-sensitive dynamic programming with a countable state 
space was established. Instead of boundedness assumption (B), Hernandez- 
Hernandez and Marcus [16] assume that there exists a stationary policy 
which induces a finite average cost that is equal to some constant in each
state. On the other hand, it is well known that an optimal risk-sensitive 
average cost may depend on the initial state (see Example 1). This behavior 
happens if the risk factor is too large. Instead of this restriction on the 
risk coefficient, we use Condition (B), which makes the process reach "good 
states" sufficiently fast. 

There is a rich literature in risk-sensitive control, going back at least to 
the seminal works of Howard and Matheson [18] and Jacobson [19], which 
covered the finite horizon case. The average cost criterion on the infinite 
horizon was studied in [5, 8, 14, 15, 16, 31] for a denumerable state space 
and in [10, 11, 20] for a general state space. It is also worth mentioning 
that risk-sensitive control finds natural applications in portfolio management,
where the objective is to maximize the growth rate of the expected utility 
of wealth; see [3, 4, 30] and the references cited therein. 

The paper is organized as follows. Below we describe a Markov control model
with the long-run average cost criterion as a performance measure and set up
some basic notation. In Section 2 we introduce preliminaries
and present the auxiliary discounted minimax problem, which is, in turn, 
solved in Section 3. The main result is established in Section 4. Section 5 
contains a discussion of Condition (B), and in the Appendix a variational 
formula for the logarithmic moment-generating function is stated. 

A discrete-time Markov control process is specified by the following objects:

(i) The state space X is a standard Borel space (i.e., a nonempty Borel 
subset of some Polish space). 

(ii) A is a Borel action space. 

(iii) K is a nonempty Borel subset of $X \times A$. We assume that, for each
$x \in X$, the nonempty x-section

$$A(x) = \{a \in A : (x,a) \in K\}$$

of K is compact and represents the set of actions available in state x.

(iv) q is a regular conditional distribution from K to X. 

(v) The one-step cost function c is a Borel measurable mapping from K
to $[0, +\infty]$.

Then the history spaces are defined as $H_0 = X$, $H_k = (X \times A)^k \times X$ and
$H_\infty = (X \times A)^\infty$. As usual, a policy $\pi = \{\pi_k, k = 0, 1, \ldots\} \in \Pi$ is a sequence
of transition probabilities from $H_k$ to $A$ such that $\pi_k(A(x_k)|h_k) = 1$, where
$h_k = (x_0, a_0, \ldots, x_k) \in H_k$. The class of stationary policies is identified with
the class $F$ of measurable functions $f$ from $X$ to $A$ such that $f(x) \in A(x)$. It
is well known that $F$ is nonempty [6]. By the Ionescu-Tulcea theorem [24],
for each policy $\pi$ and each initial state $x_0 = x$, a probability measure $P_x^\pi$
and a stochastic process $\{(x_k, a_k)\}$ are defined on $H_\infty$ in a canonical way,
where $x_k$ and $a_k$ describe the state and the decision at stage $k$, respectively.
By $E_x^\pi$ we denote the expectation operator with respect to the probability
measure $P_x^\pi$.

Let $\gamma > 0$ be a given risk factor. For any initial state $x \in X$ and policy
$\pi \in \Pi$, we define the following risk-sensitive average cost criterion:

$$J(x,\pi) = \limsup_{n\to\infty} \frac{1}{\gamma n} \log E_x^\pi \exp\Big\{ \gamma \sum_{k=0}^{n-1} c(x_k,a_k) \Big\}.$$

Our aim is to minimize $J(x,\pi)$ within the class of all policies and find a
policy $\pi^*$, for which

$$J^*(x) := \inf_{\pi\in\Pi} J(x,\pi) = J(x,\pi^*).$$

Throughout the paper the following assumption will be supposed to hold 
true even without explicit reference: 

(G) There exists $\bar\pi \in \Pi$ such that $J(x,\bar\pi) < +\infty$ for every $x \in X$.



Remark 1. Throughout the remainder, we assume that the risk factor
$\gamma > 0$ is arbitrary and fixed. Therefore, here and subsequently, we shall not
indicate that some quantities depend on $\gamma$ [e.g., we write $J(x,\pi)$ instead of
$J_\gamma(x,\pi)$, dropping the index $\gamma$].

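2. Preliminaries. Let Pr(X) be the set of all probability measures on
X. Fix $\nu \in \Pr(X)$. The relative entropy function $R(\cdot\|\nu)$ is a mapping from
Pr(X) into the extended real line $\overline{\mathbb{R}}$ defined as follows:

$$R(\mu\|\nu) := \begin{cases} \displaystyle\int_X \log\frac{d\mu}{d\nu}\, d\mu, & \text{if } \mu \ll \nu, \\ +\infty, & \text{otherwise.} \end{cases}$$

It is well known that $R(\mu\|\nu)$ is nonnegative for any $\mu \in \Pr(X)$ and $R(\mu\|\nu) = 0$
if and only if $\mu = \nu$ (consult Lemma 1.4.1 in [12]).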
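
For a finite state space the definition above reduces to a sum, which can be sketched as follows. The function name `relative_entropy` and the representation of measures as probability lists are assumptions of this illustration, not notation from the paper.

```python
import math

def relative_entropy(mu, nu):
    """R(mu || nu) for two probability vectors on the same finite set."""
    total = 0.0
    for m, n in zip(mu, nu):
        if m == 0.0:
            continue            # convention: 0 * log 0 = 0
        if n == 0.0:
            return math.inf     # mu is not absolutely continuous w.r.t. nu
        total += m * math.log(m / n)
    return total

# R vanishes exactly when the two measures coincide and is positive otherwise.
same = relative_entropy([0.5, 0.5], [0.5, 0.5])
different = relative_entropy([0.9, 0.1], [0.5, 0.5])
singular = relative_entropy([1.0, 0.0], [0.0, 1.0])
```

The third call illustrates the "otherwise" branch: the first measure charges a point that the second one does not.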

We shall consider the following auxiliary minimax problem, associated
with our original Markov control process. The set X is the state space,
while A and Pr(X) are the action sets for the decision maker and opponent,
respectively. The process then operates as follows. In a state $x_n$,
$n = 0, 1, \ldots$, the controller chooses an action $a_n \in A(x_n)$, while the opponent
selects $\mu_n(\cdot)[x_n, a_n] \in \Pr(X)$. As a consequence, the controller pays
$\gamma c(x_n,a_n) - R(\mu_n\|q(\cdot|x_n,a_n))$ to his opponent, and the system moves to the
next state according to the probability distribution $\mu_n(\cdot)[x_n,a_n]$.

We shall deal with the following classes of strategies. It will cause no
confusion if we continue to use the same letters to denote strategies for
the controller. Namely, $\pi$ stands for a randomized control strategy (policy),
whereas $f$ denotes a stationary strategy. We write $\Pi$ and $F$ to denote the sets
of corresponding strategies. For the opponent, we confine ourselves to
stationary strategies, which are identified with the class $P$ of stochastic kernels
$p$ on X given K.

Let $(\Omega, \mathcal{F})$ be the measurable space consisting of the sample space $\Omega =
(X \times A)^\infty$ and its product $\sigma$-algebra $\mathcal{F}$. Then for an initial state $x \in X$
and strategies $\pi$ and $p$, there exists a unique probability measure $\mathcal{P}_x^{\pi p}$ and,
again, a stochastic process $\{(x_k, a_k)\}$ is defined on $(\Omega, \mathcal{F})$ in a canonical way,
where $x_k$ denotes the state at time $k$ and $a_k$ is the action for the controller.
With some abuse of notation, we let $h_k$ stand for the history of the process
up to the kth state, that is,

$$h_k = (x_0, a_0, x_1, \ldots, a_{k-1}, x_k).$$

The corresponding expectation operator is denoted by $\mathcal{E}_x^{\pi p}$.

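For fixed $x \in X$, $\pi \in \Pi$ and $p \in P$, we define the following functional
costs:

(1) $$V_\beta(x,\pi,p) = \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p}\big[\gamma c(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big],$$

where $\beta \in (0, 1)$ is the discount factor, and

$$j(x,\pi,p) = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathcal{E}_x^{\pi p}\big[\gamma c(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big].$$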

Note that, since the function $R(\cdot\|\cdot)$ is lower semicontinuous on $\Pr(X) \times
\Pr(X)$ and $p$ and $q$ are stochastic kernels [i.e., measurable functions of $(x, a)$],
it follows that the mapping

$$(x, a) \mapsto R(p(\cdot|x, a)\|q(\cdot|x, a))$$

is measurable (Lemma 1.4.3(f) in [12]). Observe that $V_\beta(x,\pi,p)$ and $j(x,\pi,p)$
might be undetermined, because c can be unbounded. We thus restrict the
set of admissible strategies for the opponent in the following way.

Definition 1. Given $\pi = \{\pi_k\} \in \Pi$, we say that $p \in P$ is a $\pi$-admissible
strategy iff

(2) $$\int_{A(x_k)} R(p(\cdot|x_k,a)\|q(\cdot|x_k,a))\,\pi_k(da|h_k) < +\infty,$$

and moreover, there exists a constant $C > 0$, possibly depending on $\pi$ and
$p$, such that

$$\int_{A(x_k)} \big[\gamma c(x_k,a) - R(p(\cdot|x_k,a)\|q(\cdot|x_k,a))\big]\,\pi_k(da|h_k) + C \ge 0,$$

for all histories of the process $h_k$, $k \ge 0$, induced by $p$ and $\pi$. We denote
this set by $Q(\pi)$. [Note that this set is nonempty, since $p = q \in Q(\pi)$ for any
$\pi \in \Pi$.]

Let us introduce the following notation. For any $\pi \in \Pi$, $p \in Q(\pi)$ and
$n \ge 1$, define

(3) $$J_n(x,\pi) = \log E_x^\pi \exp\Big\{ \gamma \sum_{k=0}^{n-1} c(x_k,a_k) \Big\}$$

and

$$j_n(x,\pi,p) = \sum_{k=0}^{n-1} \mathcal{E}_x^{\pi p}\big[\gamma c(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big].$$

Now we are ready to present the result that was originally proved in 
[16] for Markov strategies. However, it still remains valid when arbitrary 
strategies for the decision maker are considered. Therefore, for the sake of 
clarity, we state the result with its proof. 

Proposition 1. Let $x \in X$ and $p \in Q(\pi)$. Then:

(a) $\sup_{p\in Q(\pi)} j_n(x,\pi,p) \le J_n(x,\pi)$ for each $n \ge 1$,

(b) $\limsup_{n\to\infty} \sup_{p\in Q(\pi)} \frac{1}{n} j_n(x,\pi,p) \le \gamma J(x,\pi)$.

Proof. (a) Let $p \in Q(\pi)$ be any stochastic kernel. For $n = 1$, we conclude

$$j_1(x,\pi,p) \le E_x^\pi(\gamma c(x,a_0)) \le \log E_x^\pi\big(e^{\gamma c(x,a_0)}\big) = J_1(x,\pi),$$

where the first inequality holds since the relative entropy is nonnegative, and
the second one is due to Jensen's inequality. Now assume that the hypothesis
is true for some $n \ge 1$. Clearly,

$$j_{n+1}(x,\pi,p) = \sum_{k=0}^{n} \mathcal{E}_x^{\pi p}\big[\gamma c(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big]
= \mathcal{E}_x^{\pi p} \sum_{k=0}^{n} \big[\gamma c(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big], \qquad n \ge 1.$$

Denote by $\pi^{(1)}$ the "1-shifted" strategy, that is,

$$\pi_k^{(1)}(\cdot|h_k) = \pi_{k+1}(\cdot|x_0,a_0,h_k), \qquad k \ge 0.$$

Then, we have

$$\begin{aligned}
j_{n+1}(x,\pi,p)
&= \mathcal{E}_x^{\pi p}\big[\gamma c(x,a_0) + j_n(x_1,\pi^{(1)},p) - R(p(\cdot|x,a_0)\|q(\cdot|x,a_0))\big] \\
&\le \mathcal{E}_x^{\pi p}(\gamma c(x,a_0)) + \mathcal{E}_x^{\pi p}\big(\mathcal{E}_x^{\pi p}\{[J_n(x_1,\pi^{(1)}) - R(p(\cdot|x,a_0)\|q(\cdot|x,a_0))] \,|\, a_0\}\big) \\
&\le E_x^{\pi} \log e^{\gamma c(x,a_0)} + E_x^{\pi} \log \int_X e^{J_n(y,\pi^{(1)})}\, q(dy|x,a_0) \\
&\le \log \int_{A(x)} \int_X e^{\gamma c(x,a) + J_n(y,\pi^{(1)})}\, q(dy|x,a)\, \pi_0(da|x) \\
&= J_{n+1}(x,\pi).
\end{aligned}$$

Clearly, the first inequality follows from the induction hypothesis. The third
inequality is due to Jensen's inequality, whilst the second one follows from
Lemma A in the Appendix. Since $p \in Q(\pi)$ is arbitrary, we get the desired
conclusion.

Part (b) follows directly from part (a). □

Remark 2. Note that in the proof of Proposition 1 we did not really
have to use the fact that $p \in Q(\pi)$. The only assumption which plays an
essential role is condition (2). Namely, it guarantees that $j_n(x,\pi,p)$ is well
defined for all $n \ge 1$, $x \in X$ and $\pi \in \Pi$. However, in Definition 1 we restrict
the opponent's class of strategies to the set $Q(\pi)$ in order to be able to apply
the Hardy-Littlewood theorem. In fact, later on it will be clear that
the set $Q(\pi)$, where $\pi \in \Pi$, is sufficiently large. Namely, the supremum of
certain discounted functional costs over the set $Q(\pi)$ will not change if we
add new elements to $Q(\pi)$; see the proofs of Lemmas 1 and 2.

Let $\bar\pi$ be as in assumption (G) and let $p \in Q(\bar\pi)$. Then from the
Hardy-Littlewood theorem (Theorem H.2 in [13]), we get

$$\limsup_{\beta\to 1} (1 - \beta) V_\beta(x,\bar\pi,p) \le \limsup_{n\to\infty} \frac{1}{n} j_n(x,\bar\pi,p)$$

and from Proposition 1(b),

$$\limsup_{n\to\infty} \sup_{p\in Q(\bar\pi)} \frac{1}{n} j_n(x,\bar\pi,p) \le \gamma J(x,\bar\pi).$$

Combining these two inequalities, we conclude that

$$\limsup_{\beta\to 1} (1 - \beta) V_\beta(x,\bar\pi,p) \le \gamma J(x,\bar\pi) \qquad \text{for every } p \in Q(\bar\pi).$$

This in turn yields

(4) $$\limsup_{\beta\to 1} (1 - \beta) V_\beta(x) \le \gamma J(x,\bar\pi),$$

where $V_\beta(x)$ is the upper value of functional cost (1), that is,

$$V_\beta(x) = \inf_{\pi\in\Pi} \sup_{p\in Q(\pi)} V_\beta(x,\pi,p).$$

Consequently, inequality (4) and assumption (G) together lead to the following:

(5) $$V_\beta(x) < +\infty$$

for each $x \in X$ and $\beta \in (0, 1)$. In addition, $V_\beta(x) \ge 0$. Now defining

$$\rho := \inf_{x\in X} \inf_{\pi\in\Pi} J(x,\pi), \qquad m_\beta := \inf_{x\in X} V_\beta(x)$$

and observing that

(6) $$\limsup_{\beta\to 1} (1 - \beta) m_\beta \le \gamma\rho,$$

one can deduce that there exists a sequence of discount factors $\{\beta_n\}$
converging to 1 for which

(7) $$\lim_{n\to\infty} (1 - \beta_n) m_{\beta_n} = l,$$

where $l$ is a certain nonnegative constant.
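
The passage from discounted sums to long-run averages used above compares an Abel mean $(1-\beta)\sum_k \beta^k a_k$ with a Cesàro mean $\frac{1}{n}\sum_{k<n} a_k$. The following sketch evaluates both for an oscillating test sequence; the sequence, the function names and the truncation length are assumptions made only for this illustration.

```python
def abel_mean(a, beta, terms=200_000):
    """Truncated Abel mean (1 - beta) * sum_{k < terms} beta**k * a(k)."""
    total, weight = 0.0, 1.0
    for k in range(terms):
        total += weight * a(k)
        weight *= beta
    return (1.0 - beta) * total

def cesaro_mean(a, n):
    """Cesaro mean (1/n) * sum_{k < n} a(k)."""
    return sum(a(k) for k in range(n)) / n

def oscillating(k):
    # Takes the values 2, 0, 2, 0, ...; its Cesaro limit equals 1.
    return 1.0 + (-1.0) ** k

abel = abel_mean(oscillating, beta=0.999)
cesaro = cesaro_mean(oscillating, n=10_000)
```

For this sequence both means approach 1 as $\beta \to 1$ and $n \to \infty$. The Hardy-Littlewood theorem itself only asserts the inequality between the upper limits, so at a fixed $\beta$ the Abel mean may still sit slightly above the Cesàro mean.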

3. A solution to the auxiliary discounted minimax problem. The main
thrust of this section is to solve the auxiliary discounted minimax problem
introduced in the previous section. In other words, we look for a discounted
functional equation whose solution is the function $V_\beta$. This is done by an
approximation of the above-mentioned minimax models by ones with bounded
cost functions. These models in turn are solved by a fixed point argument in
Proposition 2. Next, we show in Lemma 1 that the corresponding solutions
equal the upper values of some discounted costs on the infinite horizon.
Finally, the limit passage in Lemma 2 gives the desired discounted functional
equation with the function $V_\beta$ as a solution.

We shall need the following two sets of compactness-semicontinuity as- 
sumptions, which will be used alternatively. 

Condition (S).

(i) The set $A(x)$ is compact.

(ii) For each $x \in X$ and every Borel set $D \subset X$, the function $q(D|x, \cdot)$ is
continuous on $A(x)$.

(iii) The cost function $c(x, \cdot)$ is lower semicontinuous for each $x \in X$.

Condition (W).

(i) The set $A(x)$ is compact and the set-valued mapping $x \mapsto A(x)$ is
upper semicontinuous, that is, $\{x \in X : A(x) \cap B \ne \emptyset\}$ is closed for every
closed set $B$ in $A$.

(ii) The transition law $q$ is weakly continuous on K, that is, the function

$$(x,a) \mapsto \int_X u(y)\, q(dy|x,a)$$

is continuous for each bounded continuous function $u$.

(iii) The cost function c is lower semicontinuous on K.

By $L_b(X)$ and $B_b(X)$, we denote the set of all bounded lower semicontinuous
and bounded Borel measurable functions on X, respectively. Further,
let $\mathbb{N}$ stand for the set of positive integers. Choose $N \in \mathbb{N}$ and define the
truncated cost function

$$c^N(x, a) = \min\{N, c(x, a)\}, \qquad (x, a) \in K.$$

The following result was proved under Condition (W) for bounded cost
functions by a fixed point argument; see page 72 in [10]. However, a simple
and obvious modification of the proof gives the conclusion under Condition
(S) as well.

Proposition 2. Under (W) [(S)], for any discount factor $\beta \in (0, 1)$ and
a number $N \in \mathbb{N}$, there exists a unique function $w_\beta^N \in L_b(X)$ [$w_\beta^N \in B_b(X)$]
such that

(8) $$e^{w_\beta^N(x)} = \min_{a\in A(x)} \Big[ e^{\gamma c^N(x,a)} \int_X e^{\beta w_\beta^N(y)}\, q(dy|x,a) \Big]$$

for each $x \in X$, and

(9) $$0 \le (1 - \beta)\, w_\beta^N(x) \le N\gamma.$$

Moreover, there exists a stationary strategy $f^\circ \in F$ (possibly depending on $\beta$
and $N$) that attains the minimum in (8).
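
On a finite model, the fixed point in (8) can be approximated by successive approximation applied to the logarithmic form $w(x) = \min_a[\gamma c^N(x,a) + \log \sum_y e^{\beta w(y)} q(y|x,a)]$. The sketch below is only an illustration under that finiteness assumption: the model data `c`, `q` and the names `bellman`, `solve_discounted` are hypothetical. It relies on the fact that this operator is a $\beta$-contraction in the supremum norm.

```python
import math

def bellman(w, c, q, gamma, beta, N):
    """Apply w -> min_a [gamma*min(N, c) + log sum_y exp(beta*w[y]) q(y|x,a)]."""
    out = []
    for x in range(len(w)):
        values = []
        for a in range(len(c[x])):
            s = sum(math.exp(beta * w[y]) * q[x][a][y] for y in range(len(w)))
            values.append(gamma * min(N, c[x][a]) + math.log(s))
        out.append(min(values))
    return out

def solve_discounted(c, q, gamma, beta, N, tol=1e-12, max_iter=100_000):
    """Iterate the operator from w = 0 until successive iterates agree."""
    w = [0.0] * len(c)
    for _ in range(max_iter):
        new_w = bellman(w, c, q, gamma, beta, N)
        if max(abs(u - v) for u, v in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    return w

# Hypothetical two-state, two-action model: c[x][a] and q[x][a][y].
c = [[0.0, 1.0], [2.0, 0.5]]
q = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
w = solve_discounted(c, q, gamma=0.5, beta=0.9, N=1)
```

The computed `w` can be checked against the two-sided bound (9), here $0 \le (1-\beta) w(x) \le N\gamma = 0.5$.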

Let $\beta$ and $N$ be fixed in the next lemma.

Lemma 1. Assume (W) or (S). Then, it holds

(10) $$w_\beta^N(x) = \inf_{\pi\in\Pi} \sup_{p\in Q(\pi)} \sum_{k=0}^{\infty} \mathcal{E}_x^{\pi p} \beta^k \big[\gamma c^N(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big]$$

for any initial state $x \in X$.

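Proof. Note that (8) can be rewritten in the following equivalent form:

(11) $$w_\beta^N(x) = \min_{a\in A(x)} \Big[ \gamma c^N(x,a) + \log \int_X e^{\beta w_\beta^N(y)}\, q(dy|x,a) \Big].$$

Applying Lemma A in the Appendix to (11), we get

(12) $$w_\beta^N(x) = \min_{a\in A(x)} \sup_{\mu\in\Delta(x,a)} \Big[ \gamma c^N(x,a) - R(\mu\|q(\cdot|x,a)) + \beta \int_X w_\beta^N(y)\, \mu(dy) \Big]$$

with

$$\Delta(x,a) := \{\mu \in \Pr(X) : R(\mu\|q(\cdot|x,a)) < +\infty\}, \qquad (x,a) \in K.$$

Moreover, the measure

$$\mu^\circ(dy)[x,a] = \frac{e^{\beta w_\beta^N(y)}\, q(dy|x,a)}{\int_X e^{\beta w_\beta^N(z)}\, q(dz|x,a)}$$

achieves the supremum in (12). Put

(13) $$p^\circ(dy|x,a) = \mu^\circ(dy)[x,a] \qquad \text{for each } (x,a) \in K.$$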
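
The step from (11) to (12) rests on the variational formula $\log \int e^{u}\, dq = \sup_\mu [\int u\, d\mu - R(\mu\|q)]$, with the supremum attained at the exponentially tilted measure. A small numerical check on a three-point space is sketched below; the vectors `q` and `u` are arbitrary test data and all function names are assumptions of this illustration.

```python
import math
import random

def relative_entropy(mu, q):
    """R(mu || q) on a finite set, assuming every q[i] is positive."""
    return sum(m * math.log(m / qi) for m, qi in zip(mu, q) if m > 0.0)

def log_mgf(u, q):
    """log of the integral of exp(u) with respect to q."""
    return math.log(sum(qi * math.exp(ui) for ui, qi in zip(u, q)))

def tilted_measure(u, q):
    """The measure proportional to exp(u) * q, as in the formula for the maximizer."""
    weights = [qi * math.exp(ui) for ui, qi in zip(u, q)]
    z = sum(weights)
    return [wi / z for wi in weights]

def objective(mu, u, q):
    """The bracket in the variational formula: integral of u d(mu) minus R(mu || q)."""
    return sum(m * ui for m, ui in zip(mu, u)) - relative_entropy(mu, q)

q = [0.2, 0.5, 0.3]
u = [1.0, -0.5, 2.0]
mu_star = tilted_measure(u, q)
gap = log_mgf(u, q) - objective(mu_star, u, q)   # zero up to rounding error
```

In the proof, `u` plays the role of $\beta w_\beta^N$ and `q` that of $q(\cdot|x,a)$.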
Note that $p^\circ \in Q(\pi)$ for any strategy $\pi \in \Pi$. This directly follows from the
definition of $R(p^\circ(\cdot|x,a)\|q(\cdot|x,a))$ and (9). Simple calculations give the
upper bound
(•F,a)||?(-F,a)) <2^— -^exp (^^3^ J 

for every $(x,a) \in K$.

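Let $p^\circ$ be defined as in (13). By (12), we then have

$$w_\beta^N(x) \le \gamma c^N(x,a) - R(p^\circ(\cdot|x,a)\|q(\cdot|x,a)) + \beta \int_X w_\beta^N(y)\, p^\circ(dy|x,a).$$

By iteration of this inequality n times, it follows

$$w_\beta^N(x) \le \sum_{k=0}^{n} \beta^k\, \mathcal{E}_x^{\pi p^\circ}\big[\gamma c^N(x_k,a_k) - R(p^\circ(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big] + \beta^{n+1}\, \mathcal{E}_x^{\pi p^\circ} w_\beta^N(x_{n+1}),$$

where $\pi$ is any strategy for the controller. Now, letting $n \to \infty$ and making
use of (9), we conclude

$$w_\beta^N(x) \le \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p^\circ}\big[\gamma c^N(x_k,a_k) - R(p^\circ(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big].$$

Since $\pi$ is arbitrary, we get

(14) $$\begin{aligned}
w_\beta^N(x) &\le \inf_{\pi\in\Pi} \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p^\circ}\big[\gamma c^N(x_k,a_k) - R(p^\circ(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big] \\
&\le \inf_{\pi\in\Pi} \sup_{p\in Q(\pi)} \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p}\big[\gamma c^N(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big].
\end{aligned}$$

Note that inequality (14) is valid because $p^\circ \in Q(\pi)$.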
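On the other hand, by (12), we can write

$$w_\beta^N(x) \ge \gamma c^N(x,f^\circ(x)) - R(p(\cdot|x,f^\circ(x))\|q(\cdot|x,f^\circ(x))) + \beta \int_X w_\beta^N(y)\, p(dy|x,f^\circ(x)),$$

with $f^\circ$ as in Proposition 2 and any $p \in Q(f^\circ)$. Proceeding along the same
lines, we infer

$$w_\beta^N(x) \ge \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{f^\circ p}\big[\gamma c^N(x_k,f^\circ(x_k)) - R(p(\cdot|x_k,f^\circ(x_k))\|q(\cdot|x_k,f^\circ(x_k)))\big].$$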
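Since $p \in Q(f^\circ)$ is arbitrary, we easily deduce

(15) $$\begin{aligned}
w_\beta^N(x) &\ge \sup_{p\in Q(f^\circ)} \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{f^\circ p}\big[\gamma c^N(x_k,f^\circ(x_k)) - R(p(\cdot|x_k,f^\circ(x_k))\|q(\cdot|x_k,f^\circ(x_k)))\big] \\
&\ge \inf_{\pi\in\Pi} \sup_{p\in Q(\pi)} \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p}\big[\gamma c^N(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big].
\end{aligned}$$

Finally, combining (14) with (15) completes the proof. □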

In the remainder of the paper, we shall use the following notation. Let
$L(X)$ denote the set of all lower semicontinuous functions on X, whereas
$B(X)$ stands for the set of all Borel measurable functions on X.

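Lemma 2. Let (W) [(S)] hold and $\beta \in (0, 1)$. Then, we have the following:

(a) The function

$$w_\beta(x) := \lim_{N\to\infty} w_\beta^N(x)$$

is finite and nonnegative for each $x \in X$. Moreover, $w_\beta \in L(X)$ [$w_\beta \in B(X)$].

(b) The functional equation

(16) $$e^{w_\beta(x)} = \min_{a\in A(x)} \Big[ e^{\gamma c(x,a)} \int_X e^{\beta w_\beta(y)}\, q(dy|x,a) \Big]$$

holds for all $x \in X$. Furthermore, there exists a Borel measurable selector $f_\beta \in F$
of the minima in (16).

(c) For any $x \in X$, $w_\beta(x) = V_\beta(x)$.

Proof. Let $x \in X$ and $\beta \in (0, 1)$ be fixed. From (10), it is easily seen
that the sequence $\{w_\beta^N(x)\}$ is nondecreasing in $N$. Therefore, $w_\beta(x) =
\lim_{N\to\infty} w_\beta^N(x)$ exists and by (9), it is nonnegative. Clearly, under (S),
$w_\beta \in B(X)$, whereas, under (W), $w_\beta \in L(X)$; see Proposition 10.1 in [26].

In order to prove that $w_\beta(x)$ is finite for each $x \in X$, observe first that,
for any $\pi \in \Pi$, $p \in Q(\pi)$ and $N \in \mathbb{N}$,

$$\begin{aligned}
V_\beta(x,\pi,p) &= \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p}\big[\gamma c(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big] \\
&\ge \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p}\big[\gamma c^N(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big].
\end{aligned}$$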
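Moreover, from Lemma 1, we have

$$\begin{aligned}
V_\beta(x) &= \inf_{\pi\in\Pi} \sup_{p\in Q(\pi)} V_\beta(x,\pi,p) \\
&\ge \inf_{\pi\in\Pi} \sup_{p\in Q(\pi)} \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{\pi p}\big[\gamma c^N(x_k,a_k) - R(p(\cdot|x_k,a_k)\|q(\cdot|x_k,a_k))\big] \\
&= w_\beta^N(x).
\end{aligned}$$

Hence, letting $N \to \infty$, it follows

(17) $$V_\beta(x) \ge \lim_{N\to\infty} w_\beta^N(x) = w_\beta(x).$$

By (5), $V_\beta(x)$ is finite for each $x \in X$, and so is $w_\beta(x)$. This finishes the proof of
part (a).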

In order to prove part (b), note that by (11) and part (a) the limit

(18) $$\lim_{N\to\infty} \min_{a\in A(x)} \Big[ \gamma c^N(x,a) + \log \int_X e^{\beta w_\beta^N(y)}\, q(dy|x,a) \Big]$$

exists. Since the first and the second term in (18) are nondecreasing and
(W) or (S) holds, we may interchange the limit with the minimum
(see Proposition 10.1 in [26]). Furthermore, making use of the Lebesgue
monotone convergence theorem, we conclude (16). The existence of a Borel
measurable selector $f_\beta \in F$ follows from the compactness-semicontinuity
assumptions and Proposition D.5 in [17].

We now turn to proving part (c). Again, taking a logarithm on both sides
of (16), it follows

(19) $$w_\beta(x) = \min_{a\in A(x)} \Big[ \gamma c(x,a) + \log \int_X e^{\beta w_\beta(y)}\, q(dy|x,a) \Big].$$

Applying Lemma A in the Appendix to (19), we easily obtain

(20) $$w_\beta(x) = \min_{a\in A(x)} \sup_{\mu\in\Delta(x,a)} \Big[ \gamma c(x,a) - R(\mu\|q(\cdot|x,a)) + \beta \int_X w_\beta(y)\, \mu(dy) \Big]$$

with

$$\Delta(x,a) = \{\mu \in \Pr(X) : R(\mu\|q(\cdot|x,a)) < +\infty\}, \qquad (x,a) \in K.$$

Observe that by (20), for any $p \in Q(f_\beta)$, the following holds:

$$w_\beta(x) \ge \gamma c(x,f_\beta(x)) - R(p(\cdot|x,f_\beta(x))\|q(\cdot|x,f_\beta(x))) + \beta \int_X w_\beta(y)\, p(dy|x,f_\beta(x)).$$
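Iterating this inequality n times, we immediately obtain

(21) $$\begin{aligned}
w_\beta(x) &\ge \sum_{k=0}^{n} \beta^k\, \mathcal{E}_x^{f_\beta p}\big[\gamma c(x_k,f_\beta(x_k)) - R(p(\cdot|x_k,f_\beta(x_k))\|q(\cdot|x_k,f_\beta(x_k)))\big] + \beta^{n+1}\, \mathcal{E}_x^{f_\beta p} w_\beta(x_{n+1}) \\
&\ge \sum_{k=0}^{n} \beta^k\, \mathcal{E}_x^{f_\beta p}\big[\gamma c(x_k,f_\beta(x_k)) - R(p(\cdot|x_k,f_\beta(x_k))\|q(\cdot|x_k,f_\beta(x_k)))\big].
\end{aligned}$$

Now note that, by Definition 1,

$$\mathcal{E}_x^{f_\beta p}\big[\gamma c(x_k,f_\beta(x_k)) - R(p(\cdot|x_k,f_\beta(x_k))\|q(\cdot|x_k,f_\beta(x_k)))\big] \ge -C,$$

for some $C > 0$ and $k \ge 1$. Thus, letting $n \to \infty$ in (21), it follows

$$w_\beta(x) \ge \sum_{k=0}^{\infty} \beta^k\, \mathcal{E}_x^{f_\beta p}\big[\gamma c(x_k,f_\beta(x_k)) - R(p(\cdot|x_k,f_\beta(x_k))\|q(\cdot|x_k,f_\beta(x_k)))\big] = V_\beta(x,f_\beta,p).$$

Since $p \in Q(f_\beta)$ is arbitrary, we see that

(22) $$w_\beta(x) \ge \sup_{p\in Q(f_\beta)} V_\beta(x,f_\beta,p) \ge V_\beta(x).$$

Inequalities (17) and (22) combined conclude the proof of part (c). □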

4. A solution to the risk-sensitive control problem. For any $x \in X$ and
any discount factor $\beta \in (0, 1)$, define

$$h_\beta(x) := V_\beta(x) - m_\beta$$

with $m_\beta = \inf_{x\in X} V_\beta(x)$. Obviously, $h_\beta$ is nonnegative.

The following boundedness assumption is supposed to hold true. As
mentioned in the Introduction, we put off discussing it until Section 5:

Condition (B). For any $x \in X$, $\sup_{\beta\in(0,1)} h_\beta(x) < +\infty$.

Remark 3. A similar assumption and its equivalent variants were used 
to study the expected average cost criterion for Markov decision processes 
in the risk-neutral setting [17, 27, 28]. Roughly speaking, Hernandez-Lerma 
and Lasserre [17], Schal [27], and Sennott [28] assume that the family of the 
so-called normalized $\beta$-discounted cost functions is bounded. This assumption,
however, simply holds for ergodic Markov decision processes. More
precisely, if the n-step transition probabilities converge to the unique
invariant probability measure geometrically fast, and the cost functions are
bounded (or more generally satisfy a certain growth hypothesis), then the 
aforementioned family of functions is pointwise relatively compact [21, 22]. 
It is worth pointing out that this requirement is crucial to obtain the opti- 
mality inequality in the risk-neutral case; see [27, 28]. In Section 5 we provide 
an example that illustrates that also in the risk-sensitive case Condition (B) 
cannot be weakened. 

We shall need the following two versions of Fatou's lemma for converging 
measures. 

Lemma 3. Let $\{\mu_n\}$ be a sequence of probability measures converging to
$\mu \in \Pr(X)$ and let $\{h_n\}$ be a sequence of measurable nonnegative functions
on X. Then,

$$\int_X h(y)\, \mu(dy) \le \liminf_{n\to\infty} \int_X h_n(y)\, \mu_n(dy)$$

in the following cases:

(a) $\{\mu_n\}$ converges setwise to $\mu$ [i.e., $\int_X f(y)\, d\mu_n(y) \to \int_X f(y)\, d\mu(y)$ for all
$f \in B_b(X)$], and $h(x) = \liminf_{n\to\infty} h_n(x)$;

(b) $\{\mu_n\}$ converges weakly to $\mu$, and $h(x) = \inf\{\liminf_{n\to\infty} h_n(x_n) : x_n \to
x\}$; moreover, $h \in L(X)$.

Proof. Part (a) is due to Royden [25], page 231, whereas part (b) was 
proved by Serfozo [29]. For the proof of lower semicontinuity of h, the reader 
is referred to Lemma 3.1 in [22]. □ 

Now we are in a position to state the main result of the paper. This theo- 
rem concerns a study of the risk-sensitive average cost optimality inequality, 
which is sufficient to establish the existence of an optimal stationary policy. 

Theorem 1. Assume (B) and (W) [or (S)]. Then, for each risk factor
$\gamma > 0$, there exist a constant $l$ and a nonnegative function $h \in L(X)$ [$h \in
B(X)$] and $f \in F$ such that

(23) $$\begin{aligned}
h(x) + l &\ge \min_{a\in A(x)} \Big[ \gamma c(x,a) + \log \int_X e^{h(y)}\, q(dy|x,a) \Big] \\
&= \gamma c(x,f(x)) + \log \int_X e^{h(y)}\, q(dy|x,f(x))
\end{aligned}$$

for all $x \in X$. Moreover,

$$\frac{l}{\gamma} = \inf_{\pi\in\Pi} J(x,\pi) = J(x,f).$$

In other words, $l/\gamma$ is the optimal risk-sensitive average cost and $f$ is a
risk-sensitive average cost optimal stationary policy.
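
For a finite model, a candidate pair $(h, l)$ can be tested against inequality (23) state by state. The sketch below does this for the two-state chain of Example 1 in Section 5 with $\gamma < -\log(1-p)$. The candidate $h$ used here was obtained by solving (23) with equality and $l = 0$; that choice, the numerical values and the function name `inequality_slack` are assumptions of this illustration, not formulas stated in the theorem.

```python
import math

def inequality_slack(h, l, c, q, gamma):
    """Left-hand side of (23) minus its right-hand side, for every state."""
    slack = []
    for x in range(len(h)):
        rhs = min(
            gamma * c[x][a]
            + math.log(sum(math.exp(h[y]) * q[x][a][y] for y in range(len(h))))
            for a in range(len(c[x]))
        )
        slack.append(h[x] + l - rhs)
    return slack

# Example 1 data: one action, c(x) = x, state 0 absorbing, state 1 moves to 0 w.p. p.
gamma, p = 0.5, 0.5                      # here gamma < -log(1 - p), i.e. case (I)
c = [[0.0], [1.0]]
q = [[[1.0, 0.0]], [[p, 1.0 - p]]]

# Candidate solution (an assumption of this sketch): h(0) = 0, l = 0.
h = [0.0, math.log(p * math.exp(gamma) / (1.0 - math.exp(gamma) * (1.0 - p)))]
slack = inequality_slack(h, 0.0, c, q, gamma)
```

A nonnegative slack in every state means that the candidate satisfies (23); with $l = 0$ this agrees with the zero optimal cost reported for case (I) of Example 1.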

Remark 4. (a) There are two papers [16, 27] that can be treated as
predecessors of our work. They both deal with the optimality inequality but
within two different frameworks. The first work [16] establishes the optimality
inequality for the risk-sensitive dynamic programming on a denumerable
state space. In the other one, the result is obtained for Markov control
processes on an uncountable state space for the risk factor $\gamma = 0$. From this
point of view, our result is an extension of Theorem 4.1 in [16] to a general
state space and of Theorem 3.8 in [27] to the risk-sensitive case. Moreover, the
common feature of the discussed results is that their proofs are based on the
vanishing discount factor approach. Our proof also relies on this method, and
similarly, as in [27] or [21, 22], makes use of the Fatou lemmas for setwise
and weakly convergent measures.

(b) Finally, it is also worth mentioning that there are papers studying the
optimality equation in the risk-sensitive dynamic programming, which is of
the following form:

(24) $$h(x) + l = \min_{a\in A(x)} \Big[ \gamma c(x,a) + \log \int_X e^{h(y)}\, q(dy|x,a) \Big].$$

The constant $l/\gamma$ is (under suitable assumptions) an optimal cost with respect
to the risk-sensitive average cost criterion. Let us mention and discuss a few
representative papers that deal with equation (24). In [8, 15] Markov control
models satisfying a simultaneous Doeblin condition, on a finite and countable
state space, respectively, are considered. The cost functions are supposed to
be bounded and the risk factor must be sufficiently small. Otherwise, as
argued in [8], the optimality equation need not have a solution.

In [10] Di Masi and Stettner extend the result to a general state space 
by retaining bounded cost functions and replacing a simultaneous Doeblin 
condition with a very strong assumption on transition probabilities. In [11], 
however, they replace this assumption by one imposed on the risk coeffi- 
cient. Finally, the class of Markov control models that requires neither any 
ergodicity conditions nor the smallness of the risk factor was pointed out by 
Jaskiewicz in [20]. 

Fairly recently Borkar and Meyn [5] considered Markov decision processes
with unbounded cost functions on a denumerable state space. Their result
assumes the following: the state space is irreducible under all Markov
policies, the costs are norm-like, and there exists a policy that induces a finite
average risk-sensitive cost. Moreover, their proof is based on a multiplicative 
ergodic theorem that was studied in more detail in [1]. 

Proof of Theorem 1. Let $\{\beta_n\}$ be a sequence of discount factors
converging to 1 for which (7) holds. Defining

$$l := \lim_{n\to\infty} (1 - \beta_n) m_{\beta_n}$$

and applying (6), we note that

(25) $$\frac{l}{\gamma} \le \inf_{\pi\in\Pi} J(x,\pi)$$

for any $x \in X$. Assume for a while that inequality (23) is satisfied and there
exists $f \in F$ as in the statement of Theorem 1. We prove that $f$ is an optimal
policy. From (23), we have

$$h(x) \ge \gamma c(x,f(x)) - l + \log \int_X e^{h(y)}\, q(dy|x,f(x)).$$

By iteration of this inequality n times, we obtain

$$h(x) \ge \log E_x^f \exp\Big( \sum_{k=0}^{n} \gamma c(x_k,f(x_k)) + h(x_{n+1}) \Big) - (n+1)\, l.$$

Since $h$ is nonnegative, we infer

$$\frac{h(x)}{n+1} + l \ge \frac{J_{n+1}(x,f)}{n+1}$$

with $J_{n+1}(x,f)$ defined in (3). Letting $n \to \infty$, it follows

(26) $$\frac{l}{\gamma} \ge J(x,f), \qquad x \in X.$$

Hence, (25) and (26) together imply

$$\frac{l}{\gamma} = J(x,f) = \inf_{\pi\in\Pi} J(x,\pi)$$

for each $x \in X$.

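We next focus on showing inequality (23). Let $n \ge 1$ and put $h_n := h_{\beta_n}$,
$f_n := f_{\beta_n}$. Note that (19) can be rewritten in the following form:

(27) $$\begin{aligned}
(1 - \beta_n) m_{\beta_n} + h_n(x) &= \min_{a\in A(x)} \Big[ \gamma c(x,a) + \log \int_X e^{\beta_n h_n(y)}\, q(dy|x,a) \Big] \\
&= \gamma c(x,f_n(x)) + \log \int_X e^{\beta_n h_n(y)}\, q(dy|x,f_n(x)).
\end{aligned}$$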

(i) Assume first (S) and define

$$h(x) = \liminf_{n\to\infty} h_n(x).$$

Taking the lim inf on both sides of (27), we get

$$\liminf_{n\to\infty} \big((1 - \beta_n) m_{\beta_n} + h_n(x)\big) = l + h(x) = \liminf_{n\to\infty} \min_{a\in A(x)} \Big[ \gamma c(x,a) + \log \int_X e^{\beta_n h_n(y)}\, q(dy|x,a) \Big].$$

Making use of Lemma 3(a) and the measurable selection theorem (see
Proposition D.5(a) in [17]), one can prove that there exists $f \in F$ such that (23)
holds.

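(ii) Now assume (W). Fix $x_0 \in X$ and choose any $x_n \to x_0$, $n \to \infty$. Take
a subsequence $\{n_k\}$ of positive integers such that

$$\liminf_{n\to\infty} h_n(x_n) = \lim_{k\to\infty} h_{n_k}(x_{n_k}).$$

Then by (27),

(28) $$\begin{aligned}
\liminf_{n\to\infty} \big((1 - \beta_n) m_{\beta_n} + h_n(x_n)\big)
&= l + \liminf_{n\to\infty} h_n(x_n) = l + \lim_{k\to\infty} h_{n_k}(x_{n_k}) \\
&= \lim_{k\to\infty} \min_{a\in A(x_{n_k})} \Big[ \gamma c(x_{n_k},a) + \log \int_X e^{\beta_{n_k} h_{n_k}(y)}\, q(dy|x_{n_k},a) \Big] \\
&= \lim_{k\to\infty} \Big[ \gamma c(x_{n_k},f_{n_k}(x_{n_k})) + \log \int_X e^{\beta_{n_k} h_{n_k}(y)}\, q(dy|x_{n_k},f_{n_k}(x_{n_k})) \Big].
\end{aligned}$$

Note that $G = \{x_0\} \cup \{x_n\}$ is compact in X. From the upper semicontinuity
of $x \mapsto A(x)$, compactness of every $A(z)$ and Berge's theorem (see [2] or
Theorem 7.4.2 in [23]), it follows that $\bigcup_{z\in G} A(z)$ is compact in A. Therefore,
$\{f_{n_k}(x_{n_k})\}$ has a subsequence converging to some $a_0 \in A$. By (W)(i),
$a_0 \in A(x_0)$, that is, $(x_0,a_0) \in K$. Without loss of generality, assume that
$f_{n_k}(x_{n_k}) \to a_0$, $k \to \infty$. By the lower semicontinuity of the cost function c
and (28), we have

$$l + \liminf_{n\to\infty} h_n(x_n) \ge \gamma c(x_0,a_0) + \lim_{k\to\infty} \log \int_X e^{\beta_{n_k} h_{n_k}(y)}\, q(dy|x_{n_k},f_{n_k}(x_{n_k})).$$

This and Lemma 3(b) imply that

$$l + \liminf_{n\to\infty} h_n(x_n) \ge \gamma c(x_0,a_0) + \log \int_X e^{\tilde h(y)}\, q(dy|x_0,a_0),$$

where $e^{\tilde h}$ is the generalized lim inf of the sequence $e^{\tilde h_k} = e^{h_{n_k}}$. Clearly, $h \le \tilde h$,
where $h(x) := \inf\{\liminf_{n\to\infty} h_n(x_n) : x_n \to x\}$ is the generalized lim inf of
the whole sequence $\{h_n\}$. By Lemma 3(b), $h \in L(X)$. Thus,

(29) $$l + \liminf_{n\to\infty} h_n(x_n) \ge \gamma c(x_0,a_0) + \log \int_X e^{h(y)}\, q(dy|x_0,a_0).$$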
Since $x_n \to x_0$ was chosen arbitrarily, we infer from (29) that

$$l + h(x_0) \ge \gamma c(x_0,a_0) + \log \int_X e^{h(y)}\, q(dy|x_0,a_0).$$

The last inequality shows that, for any $x \in X$, there exists an $a_x \in A(x)$ such
that

(30) $$\begin{aligned}
l + h(x) &\ge \gamma c(x,a_x) + \log \int_X e^{h(y)}\, q(dy|x,a_x) \\
&\ge \min_{a\in A(x)} \Big[ \gamma c(x,a) + \log \int_X e^{h(y)}\, q(dy|x,a) \Big].
\end{aligned}$$

By our compactness-semicontinuity assumptions and Proposition D.5(b) in
[17], there exists some $f \in F$ such that (23) holds. □

5. A discussion. This section is devoted to a discussion of Condition (B). 
We start with revisiting Example 3.1 in [8]. 

Example 1. Put $X = \{0, 1\}$, $A = \{a\}$, $c(x) := c(x, a) = x$ and the
transition matrix is as follows:

$$\begin{pmatrix} 1 & 0 \\ p & 1-p \end{pmatrix},$$

where $p \in (0, 1)$. Recall that the following was proved.
Let us consider three cases for the risk factor $\gamma$:

(I) $\gamma < -\log(1-p)$,
(II) $\gamma = -\log(1-p)$,
(III) $\gamma > -\log(1-p)$.

Then if (I) or (II) holds, the optimal risk-sensitive average cost equals 0
and is independent of the initial state. In case (III) we have $J^*(0) = 0$ and
$J^*(1) = 1 + \frac{\log(1-p)}{\gamma} > 0$. In addition, it is interesting to observe that, for
cases (II) and (III), there does not exist a function $h : X \to \mathbb{R}$ such that
optimality inequality (23) is satisfied. Indeed, to see this take $x = 1$ and
consider (III). The optimality inequality is then as follows:

$$\gamma J^*(1) + h(1) = \gamma + \log(1-p) + h(1) \ge \gamma + \log\big(e^{h(1)}(1-p) + e^{h(0)} p\big).$$

Note that the right-hand side is strictly greater than $\gamma + \log(e^{h(1)}(1-p))$,
which equals the left-hand side. Similar calculations for case (II) also
lead to a contradiction. Hence, although an optimal cost is constant, the
optimality inequality need not have a solution.
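
The values of $J^*(1)$ quoted in Example 1 can be reproduced numerically from definition (3). Since there is a single action, $u_n(x) = E_x \exp\{\gamma \sum_{k<n} c(x_k)\}$ satisfies the one-step recursion $u_n(1) = e^{\gamma}[(1-p)\,u_{n-1}(1) + p\,u_{n-1}(0)]$ with $u_n(0) = 1$. The sketch below iterates this recursion; the function name and the parameter values are assumptions of this illustration.

```python
import math

def finite_horizon_average(gamma, p, n):
    """Return J_n(1) / (gamma * n) for the two-state chain of Example 1."""
    u0, u1 = 1.0, 1.0      # u_0(0) = u_0(1) = 1 (empty sum in the exponent)
    for _ in range(n):
        # State 0 is absorbing with zero cost, so u0 stays equal to 1.
        u1 = math.exp(gamma) * ((1.0 - p) * u1 + p * u0)
    return math.log(u1) / (gamma * n)

p = 0.5                                           # -log(1 - p) is about 0.693
case_I = finite_horizon_average(0.5, p, 400)      # gamma below the threshold
case_III = finite_horizon_average(2.0, p, 400)    # gamma above the threshold
limit_III = 1.0 + math.log(1.0 - p) / 2.0         # the value 1 + log(1-p)/gamma
```

With these parameters `case_I` is close to 0 and `case_III` is close to `limit_III`, in line with the dependence on the initial state described in case (III). The horizon is kept moderate because $u_n(1)$ grows geometrically in case (III).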

Now we turn to checking Condition (B). Let Vr be as in Lemma 2. Clearly, 
Vp = w% for A7 > 1 and Vp(0) = 0. Then, by (8) under (I), we get 

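\[
V_\beta(1) = \gamma + \log\bigl[e^{\beta V_\beta(1)}(1-p) + p\bigr] < \gamma + \log\bigl[e^{V_\beta(1)}(1-p) + p\bigr].
\]

Hence,

\[
V_\beta(1) < \log\left(\frac{p e^{\gamma}}{1 - e^{\gamma}(1-p)}\right), \qquad \beta \in (0, 1),
\]

and consequently, $\sup_{\beta \in (0,1)} h_\beta(x) < +\infty$.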

Now let the risk factor $\gamma$ be as in (III). Then by (8),

\[
V_\beta(1) > \gamma + \log(1-p) + \beta V_\beta(1),
\]

which in turn implies that

\[
V_\beta(1) > \frac{\gamma + \log(1-p)}{1 - \beta}.
\]

Thus, $h_\beta(1) = V_\beta(1)$ goes to infinity as $\beta \uparrow 1$.
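For case (II), we obtain

\[
\begin{aligned}
V_\beta(1) &= -\log(1-p) + \log\bigl[e^{\beta V_\beta(1)}(1-p) + p\bigr] \\
&= \beta V_\beta(1) + \log\left[1 + e^{-\beta V_\beta(1)} \frac{p}{1-p}\right].
\end{aligned}
\tag{31}
\]

If $V_\beta(1) \to +\infty$ as $\beta \uparrow 1$, then the right-hand side of (31) also goes to
infinity. On the contrary, assume that $\sup_{\beta \in (0,1)} V_\beta(1) < C$ for some constant
$C > 0$. Then,

\[
V_\beta(1) > \frac{\log\bigl[1 + e^{-C} p/(1-p)\bigr]}{1 - \beta},
\]

which leads to a contradiction as $\beta \uparrow 1$. In consequence, in case (II) the
family $\{h_\beta(1)\}$ does not satisfy Condition (B) either.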
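
The behavior of $V_\beta(1)$ in the three cases can also be observed numerically. The following Python sketch is an illustration only. It assumes that $V_\beta(0) = 0$ and that the discounted equation at state 1 has the form $V = \gamma + \log[e^{\beta V}(1-p) + p]$, as in the calculations above. The fixed point is computed by successive approximation started from $V = 0$, which converges because the derivative of the right-hand side with respect to $V$ lies in $(0, \beta)$. The function names, the tolerance and the value p = 0.5 are assumptions made for the example.

```python
import math


def log_add_exp(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def discounted_value(p: float, gamma: float, beta: float,
                     tol: float = 1e-10, max_iter: int = 1_000_000) -> float:
    """Fixed point of V = gamma + log(exp(beta*V)*(1-p) + p), iterated from V = 0."""
    v = 0.0
    for _ in range(max_iter):
        v_next = gamma + log_add_exp(beta * v + math.log(1.0 - p), math.log(p))
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    return v  # returned without convergence if max_iter is exhausted


p = 0.5
threshold = -math.log(1.0 - p)
for gamma in (0.3, threshold, 2.0):  # cases (I), (II), (III)
    values = [discounted_value(p, gamma, beta) for beta in (0.9, 0.99, 0.999)]
    print(gamma, values)
```

Under these assumptions the printed values stay bounded in case (I), grow slowly in case (II), and grow roughly like $(\gamma + \log(1-p))/(1-\beta)$ in case (III).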

Therefore, the following conclusion can be drawn: Condition (B) is
necessary to obtain a solution to the optimality inequality.

To verify Condition (B), one can use Lemma 4 below. For a
similar result in the risk-neutral case, we refer to [27, 28]. For some $\eta > 0$,
define the stopping time

\[
\tau = \tau(\beta) := \inf\{n \ge 0 : V_\beta(x_n) \le m_\beta + \eta\}.
\]

Lemma 4. For $\eta > 0$, $\beta \in (0, 1)$ and $x \in X$,

\[
h_\beta(x) \le \eta + \inf_{\pi \in \Pi} \log E_x^\pi \exp\left(\gamma \sum_{k=0}^{\tau-1} c(x_k, a_k)\right).
\]

Proof. By Lemma 2(b), (c) and the fact that $V_\beta(y) \ge 0$, $y \in X$, we
have

\[
\begin{aligned}
V_\beta(x) &= \min_{a \in A(x)} \left[ \gamma c(x, a) + \log \int_X e^{\beta V_\beta(y)}\, q(dy \mid x, a) \right] \\
&\le \gamma c(x, a) + \log \int_X e^{V_\beta(y)}\, q(dy \mid x, a)
\end{aligned}
\tag{32}
\]

for each $x \in X$. Subtracting $m_\beta$ from both sides in (32), we obtain

\[
V_\beta(x) - m_\beta \le \gamma c(x, a) + \log \int_X e^{V_\beta(y) - m_\beta}\, q(dy \mid x, a).
\]



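Iteration of this inequality up to the stopping time $\tau$ yields, for any policy $\pi \in \Pi$,

\[
\begin{aligned}
h_\beta(x) &\le \log E_x^\pi \exp\left(\gamma \sum_{k=0}^{\tau-1} c(x_k, a_k) + h_\beta(x_\tau)\right) \\
&\le \log E_x^\pi \exp\left(\gamma \sum_{k=0}^{\tau-1} c(x_k, a_k) + \eta\right) \\
&= \eta + \log E_x^\pi \exp\left(\gamma \sum_{k=0}^{\tau-1} c(x_k, a_k)\right).
\end{aligned}
\]

Since $\pi \in \Pi$ is an arbitrary policy, we easily get the conclusion. □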

Note that the fact

\[
E_x^\pi \exp\left(\gamma \sum_{k=0}^{\tau-1} c(x_k, a_k)\right) < +\infty \tag{33}
\]

has the following interpretation: before the process reaches "good states,"
the costs incurred at "early stages" should not be too large. Indeed, let us
define a set $D$ as follows. We say that

\[
x \in D \quad \text{iff} \quad V_\beta(x) \le m_\beta + \eta
\]

for a certain $\eta > 0$. Clearly, $D \ne \emptyset$. Denote by $\tau_D$ the first return time of
the process, governed by $f_\beta$, to the set $D$. Certainly, if (33) holds with $\tau := \tau_D$,
then Condition (B) is satisfied.

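In Example 1 we can take $D = \{0\}$ and $\eta = 0$, since $V_\beta(0) \le 0 + 0$. If $\gamma$ is
as in (I), then (33) holds:

\[
E_1 \exp\left(\gamma \sum_{k=0}^{\tau-1} c(x_k)\right) = \sum_{n=1}^{\infty} e^{\gamma n} (1-p)^{n-1} p = \frac{p e^{\gamma}}{1 - e^{\gamma}(1-p)} < +\infty.
\]

In other cases (33) fails to hold and, in addition, the earlier calculations
show that $\sup_{\beta \in (0,1)} h_\beta(1) = +\infty$.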
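
The dichotomy for (33) in Example 1 can be illustrated numerically. The Python sketch below rests on assumptions that are not spelled out in the text: the process starts at state 1, $\tau$ is the first hitting time of state 0 (so that $\tau$ is geometric with parameter $p$), and the accumulated cost up to $\tau$ equals $\tau$. Under these assumptions the expectation in (33) is the series $\sum_{n \ge 1} e^{\gamma n}(1-p)^{n-1} p$, whose partial sums are compared with the closed form that is valid when $e^{\gamma}(1-p) < 1$. The function names are hypothetical.

```python
import math


def hitting_time_mgf_partial(p: float, gamma: float, n_terms: int) -> float:
    """Partial sum of sum_{n>=1} exp(gamma*n) * (1-p)**(n-1) * p."""
    total = 0.0
    for n in range(1, n_terms + 1):
        # exp(gamma*n) * (1-p)**(n-1), evaluated through a single exponential
        total += math.exp(gamma * n + (n - 1) * math.log(1.0 - p)) * p
    return total


def hitting_time_mgf_closed(p: float, gamma: float) -> float:
    """Closed form of the series; finite only if exp(gamma) * (1 - p) < 1."""
    ratio = math.exp(gamma) * (1.0 - p)
    if ratio >= 1.0:
        return math.inf
    return p * math.exp(gamma) / (1.0 - ratio)


p = 0.5
for gamma in (0.3, -math.log(1.0 - p), 2.0):  # cases (I), (II), (III)
    partial_sums = [hitting_time_mgf_partial(p, gamma, n) for n in (50, 100, 200)]
    print(gamma, partial_sums)
print(hitting_time_mgf_closed(p, 0.3))
```

In case (I) the partial sums settle at the closed-form value, in case (II) they grow linearly in the number of terms, and in case (III) they grow geometrically.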

Summing up, the presented example shows that, without Condition (B)
imposed on the family of functions $\{h_\beta(x)\}$, $\beta \in (0,1)$, a solution to the
optimality inequality need not exist and, moreover, the optimal risk-sensitive
average cost may depend on the initial state. In view of the above discussion,
Condition (B) is designed to prevent the accrual of infinite expected costs.
Namely, the costs incurred at transient states, which may be occupied only
at "early stages," have an important and definite influence on a long-run
performance measure. Therefore, Condition (B) requires the model to be
communicating in the sense that certain sets of "good states" are reached
sufficiently fast. Then the optimal risk-sensitive average cost is constant and
the optimality inequality holds. In addition, it is worth mentioning that
ergodicity of a Markov process/chain by itself does not help as much as in
the risk-neutral case. In other words, for an ergodic Markov chain, it may
happen that the optimal risk-sensitive average cost depends on the initial
state, as in Example 1. Moreover, in this example one can even prove in
a straightforward way that under case (I) [either under Condition (B) or
for sufficiently small risk factors], the optimality equation (24) is satisfied.
Therefore, it would be interesting to know whether Condition (B) (together
with some compactness-continuity assumptions) is sufficient to obtain a
solution to the optimality equation. There is a conjecture that, since in the
risk-neutral case a counterpart of Condition (B) is not sufficient [7], neither
is it in the risk-sensitive setting. But this question is beyond the scope of
the paper and remains open.

APPENDIX 

The lemma below establishes a variational formula for the logarithmic 
moment-generating function. The reader is referred to Theorem 4.5.1 and 
Proposition 1.4.2 in [12] for its proof. 

Lemma A. Let $X$ be a Polish space, $h$ a measurable function mapping
$X$ into $\mathbb{R}$, which is either bounded from below or bounded from above,
and $\nu$ a probability measure on $X$.

(a) Then, we have the variational formula

\[
\log \int_X e^{h}\, d\nu = \sup_{\mu \in \Delta} \left[ -R(\mu \| \nu) + \int_X h\, d\mu \right],
\]

where

\[
\Delta = \{\mu \in \Pr(X) : R(\mu \| \nu) < +\infty\}.
\]

(b) Let $\mu_0$ denote the probability measure on $X$ which satisfies $\mu_0 \ll \nu$ and

\[
\frac{d\mu_0}{d\nu} = \frac{e^{h}}{\int_X e^{h}\, d\nu}.
\]

Then, the supremum in the variational formula is attained uniquely at $\mu_0$.
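
On a finite space the variational formula reduces to an identity between finite sums, which can be checked directly. The Python sketch below is an illustration under that simplifying assumption. The three-point space, the values of $\nu$ and $h$, and all function names are chosen only for the example. It evaluates the functional $-R(\mu\|\nu) + \int h\, d\mu$ at the measure with density proportional to $e^{h}$ and at randomly drawn probability vectors.

```python
import math
import random


def relative_entropy(mu, nu):
    """R(mu || nu) = sum_i mu_i * log(mu_i / nu_i), with the convention 0 * log 0 = 0."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0.0)


def objective(mu, nu, h):
    """Value of -R(mu || nu) + integral of h with respect to mu."""
    return -relative_entropy(mu, nu) + sum(m * hi for m, hi in zip(mu, h))


def tilted_measure(nu, h):
    """Measure whose density with respect to nu is exp(h) / integral(exp(h) dnu)."""
    weights = [n * math.exp(hi) for n, hi in zip(nu, h)]
    total = sum(weights)
    return [w / total for w in weights]


nu = [0.2, 0.5, 0.3]   # reference probability measure (illustrative)
h = [1.0, -0.5, 2.0]   # bounded function on the three-point space (illustrative)
log_mgf = math.log(sum(n * math.exp(hi) for n, hi in zip(nu, h)))
mu0 = tilted_measure(nu, h)

rng = random.Random(0)
best_random = -math.inf
for _ in range(1000):
    raw = [rng.random() for _ in nu]
    mu = [r / sum(raw) for r in raw]
    best_random = max(best_random, objective(mu, nu, h))

print(log_mgf, objective(mu0, nu, h), best_random)
```

The first two printed numbers should agree up to rounding, in line with part (b), while the value found over random probability vectors should not exceed them, in line with part (a).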